DATA1220-55, Fall 2024
2024-10-04
Properties of the normal distribution let us calculate the probability of observing a given value or range of values
The Central Limit Theorem: the distribution of sample means from repeated samples of the same size \(n\), drawn from the same population (i.e. the sampling distribution), approaches a normal distribution as \(n\) increases
The sampling distribution provides point estimates and confidence intervals for population parameters
A point estimate describes the location of an estimate or distribution
A confidence interval describes the scale or precision of an estimate or distribution
The confidence threshold or confidence level describes our uncertainty regarding the accuracy of our estimates
We use Z-Scores from the standard normal distribution to calculate the boundaries of our confidence interval.
We use sample statistics to describe sample populations and estimate the parameters of the study population’s sampling distribution
We also describe the variability of our measure and quantify our uncertainty regarding our estimate
We use the overlap between theoretical distributions to decide whether any differences are meaningful
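As a recap of how Z-scores set the boundaries of a confidence interval, here is a minimal Python sketch for a proportion. The counts (60 successes in 100 trials) are hypothetical values chosen for illustration, and the helper name `proportion_ci` is an assumption, not something from the course materials.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n                   # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat
    # Z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_ci(60, 100)          # hypothetical counts
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

A higher confidence level uses a larger Z-score and therefore gives a wider interval.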
Deshaun Watson, the quarterback for the Cleveland Browns, has made 176 pass attempts in the 2024 NFL season, 106 of which were completed (60.2%). How do we know if that’s good or not?
Among the 38 quarterbacks in the NFL who have attempted at least 20 passes, the average completion rate is 65.3%.
\(\mathbf{H_0}\): The “Null” Hypothesis
Represents a position of skepticism: nothing is happening here
“There is not an association between process A and B”
\(\mathbf{H_A}\): The “Alternative” Hypothesis
The complement of \(H_0\): something is happening here
“There is an association between process A and B”
Research question: Is Deshaun Watson’s pass completion rate below average for the 2024 NFL season so far?
\(H_0\): Deshaun Watson has an average pass completion rate for the 2024 NFL season (\(p_{\operatorname{DW}} = p_{\operatorname{NFL}}\))
\(H_A\): Deshaun Watson has a below-average pass completion rate for the 2024 NFL season (\(p_{\operatorname{DW}} < p_{\operatorname{NFL}}\))
We can use the NFL’s average (65.3%) plus the sample size (\(n = 176\) passes) to construct a sampling distribution \(\hat{p} \sim N(p=65.3, SE=3.6)\) for the average NFL quarterback’s pass completion rate.
\[ \begin{aligned} SE_p &= \sqrt{\frac{p(1-p)}{n}} \\ &= \sqrt{\frac{0.653(1-0.653)}{176}} \\ &= 0.036 \end{aligned} \]
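The standard error calculation above can be checked with a few lines of Python. This is a sketch using only the values given on the slide; the variable names are illustrative.

```python
from math import sqrt

p_nfl = 0.653   # league-average completion rate (from the slide)
n = 176         # Deshaun Watson's pass attempts

# Standard error of a sample proportion under the null hypothesis
se = sqrt(p_nfl * (1 - p_nfl) / n)
print(round(se, 3))
```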
Assuming Deshaun Watson is an average NFL quarterback, what is the probability he would have a completion rate of 60.2% or less over the last 176 passes?
The probability that Deshaun Watson would have a completion rate of 0.602 or worse, assuming he is an average NFL quarterback, is 0.078.
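One way to reproduce this lower-tail probability is with `statistics.NormalDist` from the Python standard library. This is a sketch, not necessarily how the slide's figure was produced. The unrounded standard error gives a value slightly below the one obtained with the rounded 0.036; both round to 0.078.

```python
from math import sqrt
from statistics import NormalDist

p_nfl, n = 0.653, 176
se = sqrt(p_nfl * (1 - p_nfl) / n)

# Sampling distribution of the completion rate for an average quarterback
sampling_dist = NormalDist(mu=p_nfl, sigma=se)

# P(p_hat <= 0.602): area under the curve to the left of the observed rate
prob = sampling_dist.cdf(0.602)

# Same calculation with the standard error rounded to 0.036, as on the slide
prob_rounded_se = NormalDist(mu=p_nfl, sigma=0.036).cdf(0.602)

print(f"{prob:.3f} {prob_rounded_se:.3f}")
```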
Is this situation unlikely enough that we can reject our null hypothesis \(H_0\)?
\(\alpha\) is also called the significance level
The probability below which you will reject the null hypothesis
Chosen before doing the hypothesis test (often \(\alpha = 0.05\))
Also the probability of rejecting the null hypothesis when \(H_0\) is true (i.e. Type I Error or false positive rate)
In a perfect world, we will only reject \(H_0\) when \(H_A\) is true, and we will always fail to reject \(H_0\) when \(H_0\) is true.
When we reject \(H_0\) when \(H_0\) is “true” (i.e. false positive), it is called a Type I Error.
When we fail to reject \(H_0\) when \(H_A\) is “true” (i.e. false negative), it is called a Type II Error.
\(\alpha = P(\text{False Positive})\)
\(\beta = P(\text{False Negative})\)
\(1-\alpha = \text{Confidence Level}\)
\(1-\beta = \text{Power}\)
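To see that \(\alpha\) really is the false positive rate, the sketch below simulates many seasons of 176 passes for a quarterback who truly completes 65.3% of his passes, then counts how often a lower-tail test at \(\alpha = 0.05\) rejects \(H_0\) anyway. The seed, the number of trials, and the variable names are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1220)                       # arbitrary seed so the run is repeatable

p0, n, alpha = 0.653, 176, 0.05
se = sqrt(p0 * (1 - p0) / n)
z_crit = NormalDist().inv_cdf(alpha)    # lower-tail critical value, about -1.645

trials = 4000
false_positives = 0
for _ in range(trials):
    # Simulate n passes where H0 is true (completion probability p0)
    completions = sum(random.random() < p0 for _ in range(n))
    z = (completions / n - p0) / se
    if z <= z_crit:                     # H0 rejected even though it is true
        false_positives += 1

type_i_rate = false_positives / trials
print(f"simulated Type I error rate: {type_i_rate:.3f}")
```

The simulated rate should land close to 0.05, with some wobble because completions are whole numbers and the number of trials is finite.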
The justice system can be thought of as a hypothesis test. We assume the defendant is innocent (\(H_0\)) until we have sufficient evidence to reject the null hypothesis and accept the alternative hypothesis (\(H_A\)) that the defendant is guilty.
In a criminal trial, what type of error is committed when the jury finds the defendant guilty, when they were innocent?
What are the hypotheses?
\(H_0\): Defendant is not guilty
\(H_A\): Defendant is guilty
Type I Error
In a criminal trial, what type of error is committed when the jury finds the defendant not guilty, when they were guilty?
What are the hypotheses?
\(H_0\): Defendant is not guilty
\(H_A\): Defendant is guilty
Type II Error
Often, reducing the false positive rate increases the false negative rate (and vice versa)
There are different costs to false negatives and false positives
We’d rather let a guilty person go free than an innocent person go to jail (false negative preferable to false positive)
Metal detectors are overly sensitive so weapons aren’t missed during scans (false positive preferable to false negative)
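The trade-off between the two error rates can be sketched numerically. The example below reuses the completion-rate setting and assumes, purely for illustration, that the quarterback's true rate is 60.2%; the helper `beta_for` and the alternative value `p_true` are assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

p0, p_true, n = 0.653, 0.602, 176       # p_true is an assumed alternative
se0 = sqrt(p0 * (1 - p0) / n)           # standard error if H0 is true
se1 = sqrt(p_true * (1 - p_true) / n)   # standard error if the alternative is true

def beta_for(alpha):
    """P(fail to reject H0) for a lower-tail test when the true rate is p_true."""
    cutoff = p0 + NormalDist().inv_cdf(alpha) * se0   # reject H0 below this rate
    return 1 - NormalDist(p_true, se1).cdf(cutoff)

for alpha in (0.10, 0.05, 0.01):
    beta = beta_for(alpha)
    print(f"alpha={alpha:.2f}  beta={beta:.2f}  power={1 - beta:.2f}")
```

Lowering \(\alpha\) moves the rejection cutoff further into the tail, so a real difference is missed more often and \(\beta\) rises.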
When conducting a hypothesis test, you calculate a test statistic to assess how “extreme” your observed result is compared to the reference distribution
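For normal sampling distributions (means & proportions), the test statistic is the Z-score
Remember: \(Z=\frac{x-\mu}{\sigma}\). For a sample statistic, \(x\) is the observed statistic and \(\sigma\) is the standard error of its sampling distribution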
Assuming Deshaun Watson is an average NFL quarterback, what is the probability he would have a completion rate of 60.2% or less over the last 176 passes?
Assuming \(\hat{p} \sim N(p=65.3, SE=3.6)\)…
\[ \begin{aligned} Z &= \frac{\hat{p}-p}{SE} \\ &= \frac{60.2 - 65.3}{3.6} \\ &= -1.42 \end{aligned} \]
Is 1.42 standard errors below the hypothesized completion rate unusual enough to reject \(H_0\)?
A p-value is the probability of a theoretical sample having a test statistic equal to or more extreme than the one you observed, assuming that the reference distribution is “true”.
If our test statistic is \(Z=-1.42\), what is the p-value for \(Z \le -1.42\)?
The p-value is 0.078. If \(\alpha = 0.05\), then \(p > \alpha\) and we fail to reject the null hypothesis \(H_0: p=0.653\). There is insufficient evidence that Deshaun Watson has a below-average completion rate.
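The whole one-sided test, from test statistic to decision, fits in a short Python sketch. It uses the rounded percentages from the slides; the variable names and the decision strings are illustrative.

```python
from statistics import NormalDist

p_hat, p0, se = 60.2, 65.3, 3.6     # percentages, as on the slide
alpha = 0.05

z = (p_hat - p0) / se               # test statistic
p_value = NormalDist().cdf(z)       # lower tail: P(Z <= z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"Z = {z:.2f}, p-value = {p_value:.3f}: {decision}")
```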
A two-sided p-value is the probability of a test statistic at least as extreme, in either direction (greater OR lesser), as the one you observed.
If our test statistic is \(Z=-1.42\), what is the p-value for the null hypothesis \(p = 65.3\) (\(|Z| \ge 1.42\))?
The two-sided p-value is about 0.156. If \(\alpha = 0.05\), then \(p > \alpha\) and we fail to reject the null hypothesis \(H_0: p=0.653\). There is insufficient evidence that Deshaun Watson’s completion rate differs from average.
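Because the standard normal distribution is symmetric, a two-sided p-value is twice the area in one tail. A minimal sketch, using the rounded test statistic from the slide:

```python
from statistics import NormalDist

z = -1.42

# P(|Z| >= 1.42): double the area below -|z|
p_two_sided = 2 * NormalDist().cdf(-abs(z))
print(f"{p_two_sided:.3f}")
```

This is double the one-sided p-value for the same statistic, so a two-sided test needs a more extreme result to reach the same significance level.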
Deshaun Watson is averaging 4.84 yards per pass attempt in the 2024 NFL season. How do we know if that’s good or not?
Among the 38 quarterbacks in the NFL who have attempted at least 20 passes, the average yards per pass attempt approximates the distribution \(\bar{x} \sim N(\mu = 7.06, SE = 0.19)\).
Research question: Does Deshaun Watson’s average yards per pass attempt differ from the league average for the 2024 NFL season so far?
\(H_0\): Deshaun Watson has an average yards per pass attempt for the 2024 NFL season (\(\mu_{\operatorname{DW}} = \mu_{\operatorname{NFL}}\))
\(H_A\): Deshaun Watson does not have an average yards per pass attempt for the 2024 NFL season (\(\mu_{\operatorname{DW}} \neq \mu_{\operatorname{NFL}}\))
\[ \begin{aligned} Z&=\frac{\bar{x} - \mu}{SE} \\ &= \frac{4.84-7.06}{0.19} \\ &= -11.68 \end{aligned} \]
Deshaun Watson’s average is 11.68 standard errors below the hypothesized mean. Is that unusual?
If our test statistic is \(Z=-11.68\), what is the p-value for \(|Z| \ge 11.68\)?
Assuming he is an average quarterback, the probability that Deshaun Watson’s average yards per pass attempt would be at least as far from 7.06 as 4.84 is effectively zero, far below \(\alpha = 0.05\).
The p-value is extremely low, so we reject \(H_0\) in favor of \(H_A\): Deshaun Watson does not have an average yards per pass attempt.
This data provides convincing evidence that Deshaun Watson is performing below average.
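The yards-per-attempt test can be sketched the same way. With a statistic this far into the tail, a cumulative probability can round to exactly zero in floating point, so this sketch computes the tail area with `math.erfc` instead; that is an implementation choice made here, not something taken from the slides.

```python
from math import erfc, sqrt

x_bar, mu, se = 4.84, 7.06, 0.19    # values from the slides
alpha = 0.05

z = (x_bar - mu) / se               # test statistic

# Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal
p_value = erfc(abs(z) / sqrt(2))

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"Z = {z:.2f}, p-value = {p_value:.1e}: {decision}")
```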
DATA1220-55 Fall 2024, Class 16 | Updated: 2024-10-04